GEOG71922: Practical 2: looking at species occurrence and characteristic scale

In practical 2 of GEOG71922 we will cover a range of spatial techniques in R that are central to working with spatial data in ecology. Today you will learn how to: 1) convert tabulated data to spatial point distributions, 2) crop data quickly and neatly to a desired study extent, 3) reclassify a raster for more focussed analysis, 4) provide background (or pseudo-absence) points to your study area, 5) assign and transform spatial projections, 6) use for loops and 7) create your own functions in R.

The main aim of the practical will be to evaluate the characteristic scale which descibes how the occurrence of Sciurus vulgaris (Eurasian Red Squirrel) responds to land cover. We will use occurrence data downloaded from the National Biodiversity Network Atlas [https://nbnatlas.org/] to model the distribution of Sciurus vulgaris (point data uploaded since 2015) and use the UK Land Cover Map (2015) to characterize associated habitat types. We will evaluate this relationship through a general linear model approach at a number of scales using a series of buffers around occurrence data points.

1. Read in and map the data

Copy the data from the “Sciurus” folder in the GEOG71922 area on the shared drive to a folder for week two in your own GEOG71922 area. Open R Studio and create a new script for today’s practical (Prac2.R). Remember how to set the working directory?:

setwd("P:/GEOG71922/Week2")

Install the packages for today’s work:

#Install packages

install.packages(c("terra","sf","mapview"))

Load the libraries that we will need from the start of today’s practical (Note you may recieve some information or warnings in the R console similar to the ones below as produced on my machine, you can ignore these- they just provide information about versions and compatibility):

library(terra)

## terra 1.6.7

library(sf)

## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE

Now read in the spreadsheet containing the coordinates for occurrence locations and some other miscellaneous data columns. Notice that when you read in a .csv file the object automatically takes the class of data.frame.

sciurus<- read.csv("Sciurus.csv")


head(sciurus) # view the first six rows of the data

##                    NBN.Atlas.record.ID  Scientific.name Name.qualifier
## 1 fd44c89f-dc54-4bc8-b242-b25f34714c67 Sciurus vulgaris             NA
## 2 fbcf21ec-d168-4b2b-9644-d798e0d296b8 Sciurus vulgaris             NA
## 3 fbc7523e-c291-497f-8c1f-821e7a715e7a Sciurus vulgaris             NA
## 4 f9dad20d-9ea8-4b58-aafd-cdeee990d703 Sciurus vulgaris             NA
## 5 f55a1c93-8b8d-4f9d-88be-b81561231bc9 Sciurus vulgaris             NA
## 6 e2d32bf2-d58d-4cfc-ae48-ac5668d7842f Sciurus vulgaris             NA
##             Common.name Species.ID..TVK. Taxon.Rank Occurrence.status
## 1 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 2 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 3 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 4 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 5 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 6 Eurasian Red Squirrel NBNSYS0000005108    species           present
##   Start.date.year Locality  OSGR Latitude Longitude Coordinate.uncertainty_m
## 1            2015          NO64G 56.57747  -2.60386                     2000
## 2            2015          SD31R 53.60975  -2.95368                     2000
## 3            2015          NS99Z 56.17291  -3.62832                     2000
## 4            2015          SD31W 53.60999  -2.92346                     2000
## 5            2015          SD31V 53.59201  -2.92306                     2000
## 6            2015          SD30N 53.55558  -2.98265                     2000
##   Identification.verification.status Vitality
## 1                        Unconfirmed         
## 2      Accepted - considered correct         
## 3                        Unconfirmed         
## 4      Accepted - considered correct         
## 5      Accepted - considered correct         
## 6      Accepted - considered correct

#check the class of the loaded .csv file.
class(sciurus)

## [1] "data.frame"

We will do a small amount of data cleaning next, to ensure there are no NA values in our coordinate data (this would cause later analytical steps to crash). Here we use the subset() function and !is.na() to select all rows with complete values for latitude (‘!’ means “is not” in the R language). In addition let’s remove all records with coordinate uncertainty greater than 100 m.

#subset the data to only include points with complete coordinates.
sciurus<-subset(sciurus,!is.na(sciurus$Latitude))

#remove all points with uncertainty > 100m
sciurus<-subset(sciurus, sciurus$Coordinate.uncertainty_m<100)

2. Create a vector layer from coordinates

Next we make a vector layer from our sciurus object. For this we combine two objects, a list of cooordinates (in this case lat and long) and a Proj.4 string. The latter tells R what coordinate system our coordinates are derived from. In this case we are working (for now) in the WGS84 system so we create an object (crs.latlong) to hold a character string that contains the information for this coordinate system. The full string that describes the projection for WGS84 is “+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0”. Fortunately, R allows use to pass the epsg code for WGS84 instead. epsg codes are a kind of shorthand for projection strings and the code for WGS84 is “epsg:4326”. So all we have to do is assign this code to an object (“crs.latlong” below) using the crs() function and tell the SpataialPoints() function that this object contains our projection string.

Note the use of the ‘$’ symbol below for accessing colmuns of dataframes.

#make spatial points layer

#create crs object
sciurus.latlong<-data.frame(x=sciurus$Longitude,y=sciurus$Latitude)



#Use coordinates object to create our spatial points object

sciurus.sp<-vect(sciurus.latlong,geom=c("x","y"))

#check that the points now have our desired crs. 
crs(sciurus.sp)<-"epsg:4326"

Type ?crs() directly into the console and press enter to bring up information about this function in the help panel. The crs() function is very versatile and allows us to get or set the crs of any raster or Spatial object (i.e. any vector object associated with the ‘sp’ library) as well as creating character objects that hold projection strings that can then be applied to any number of objects. The latter is important when you have data without a formal projection (usually from a text or csv file, as in our case) and want to assign one from scratch. The crs() function uses the PROJ.4 library and relies on rgdal for projecting. You can find more information on epsg codes here [http://www.epsg-registry.org/] and a handy overview of using CRS in R here [https://tinyurl.com/y65j2o94]

Now plot our points:

plot(sciurus.sp)

Now let’s have a look at a handy tool in R that allows us to work interactively with spatial data in R - the ‘mapview’ library. This allows use to look at our data and pan around their extent against base maps similar to how you would using ArcGIS. Run the following code then have a play around with the options. When you’re ready, move on to the next step. We will focus our analysis around the Cairngorms area of Scotland, see if you can locate it on the map.

library(mapview)
mapview(st_as_sf(sciurus.sp))#Note: you may receive a warning message about Feature IDs when you run this command but you can ignore this for now).

## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a
## multiple of vector length (arg 1)

We have a lot of occurrence data and although it would be nice to consider all of these points, that would be computationally too intensive for our practical session so let’s cut them down to something we can work with by clipping them to a much smaller extent. To do this we use the crop() function and create a new extent:

#First set the exetent to something workable

studyExtent<-c(-4.2,-2.7,56.5,57.5) #list coordinates in the order: min Long, max Long, min Lat, max Lat

#now crop our points to this area

C<-crop(sciurus.sp,studyExtent)

3. Mapping land-cover

Now lets bring in our landcover data to see where our points fit on the map.Y ou will probably receive a warning message on loading the raster layer. This relates to some changes and updates to the Proj library which may not yet be fully implemented in certain packages (such as raster). These deprecated versions still work fine for now so this is nothing to worry about.

#read in our raster data

LCM<-rast("LCMUK.tif")

The raster is set to OSGB, so our two data sets have different projections. Let’s fix that with the project() function. Here we create an object containing the crs information for LCM inside the project() function and pass it to our points layer:

sciurusFin<-project(C,crs(LCM))

Note that above we did not need to use any character strings to define our projection.This is down to the versatility of the crs() function whereby, in this case, we can simply pass the crs of the LCM layer directly to the spTransform function.

Before we continue, let’s crop the land cover data to the extent of our points data (plus 5km to allow space for the buffers we will create in subsequent steps).

sciurusCoords<-crds(sciurusFin)




x.min <- min(sciurusCoords[,1]) - 5000
x.max <- max(sciurusCoords[,1]) + 5000
y.min <- min(sciurusCoords[,2]) - 5000
y.max <- max(sciurusCoords[,2]) + 5000

extent.new <- ext(x.min, x.max, y.min, y.max)



LCM <- crop(LCM$LCMUK_1, extent.new)

plot(LCM)

plot(sciurusFin,add=TRUE)

Now we are ready to start characterizing the locations of our points with the LCM layer.

4. Generating (pseudo-)absence points

Our analysis will be based on attempting to predict the locations of our points relative to the underlying habitat. In order to do this however we want also to characterize areas where Sciurus vulgaris doesn’t occur. We could try without absence data (and some algorithms such as Maxent and Bioclim are built for this) but it is more robust to provide absence as well as presence data in our modelling. Since we don’t have formal absence point locations, we will simulate our own by creating random background points (also called pseudo-absence data). For this we will use the spatSample() function.This function creates a matrix of random coordinates which we can then convert to a vector layer.

Note that in practice we might create many more background points to get a representative sample but we will settle for 1000 for this practical to reduce processing time later on.

set.seed(11)

back.xy <- spatSample(LCM, size=1000,as.points=TRUE) 

#create a spatialPoints layer from the back.xy matrix

plot.new()

plot(LCM)

plot(sciurusFin,add=T)

plot(back.xy,add=TRUE, col='red')

5. Characterizing point locations

Now we will use some functions that we learned last week to extract land cover values to our points and plot them to compare presence and absence locations.

eA<-extract(LCM,back.xy)

eP<-extract(LCM,sciurusFin)

A very useful function for cross-tabulating data (i.e. summing counts per category of interest) is the table() function. Let’s use this to tabulate counts per landcover type for our presence and absence points.

#calculate point frequency per landcover category (headings are landcover category codes - see LCM documentation for descriptions)
table(eA[,2])

## 
##   0   1   2   3   4   7   9  10  11  12  14  16  20  21 
##   2  39 129 101 122 129 328  53  26  48  13   1   2   7

table(eP[,2])

## 
##   1   2   3   4   7   9  10  11  14  21 
## 101  87  26  45  28   1   1   1   1  25

Use the range of values provided by table(eA) and table(eP) to set the x axes of two histograms comparing presence and absence locations. Note ylim is will be 1 on this occasion as we will set the argument ‘freq’ in the hist() function to FALSE so that the values on the y axis of our histograms represent proportions rather than counts (this is important as the number of points in our presence and absence data are markedly different).

par(mfrow=c(1,2))

hist(eA[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1))




hist(eP[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1))

The max value on our y axis seems to be 0.4, so change ylim to ylim=c(0,0.5) for both histograms and run again (remember the c() function for combining elements - here min and max values for our axes).

hist(eA[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.4))

hist(eP[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.5))

From these outputs we can see that the proportion of points occuring in broadleaf woodland (class 1 in the UK Landcover Map - see documentation) is much higher for presence relative to background points. The focus for the rest of the practical will be determining to what extent this response to broadleaf woodland differs as a function of scale and, if it does, what is the characteristic scale of this effect (so we can operationalize this optimum scale in future analyses, for example, for species distribution modelling).

Next let’s create a single data frame to hold both our presence and absence data so that we can start to carry out our scale analysis on the one dataset.

Abs<-data.frame(crds(back.xy),Pres=0)

head(Abs)

##          x        y Pres
## 1 357087.5 747287.5    0
## 2 288062.5 774962.5    0
## 3 319062.5 791362.5    0
## 4 284137.5 836537.5    0
## 5 286387.5 762612.5    0
## 6 292987.5 825812.5    0

Pres<-data.frame(crds(sciurusFin),Pres=1)


head(Pres)

##          x        y Pres
## 1 302144.9 826414.9    1
## 2 348875.4 780665.7    1
## 3 313612.9 791182.1    1
## 4 302332.6 820623.6    1
## 5 291470.2 822623.9    1
## 6 293429.5 818942.5    1

sciurusData<-rbind(Pres,Abs) # bind the two data frames by row (both dataframes have the same column headings)

#inspect
head(sciurusData)

##          x        y Pres
## 1 302144.9 826414.9    1
## 2 348875.4 780665.7    1
## 3 313612.9 791182.1    1
## 4 302332.6 820623.6    1
## 5 291470.2 822623.9    1
## 6 293429.5 818942.5    1

tail(sciurusData)

##             x        y Pres
## 1311 346362.5 779337.5    0
## 1312 322587.5 779687.5    0
## 1313 273387.5 836312.5    0
## 1314 306137.5 849837.5    0
## 1315 271637.5 803512.5    0
## 1316 313462.5 802687.5    0

class(sciurusData)

## [1] "data.frame"

6. Re-classifying the raster

In order to explore how occurrence responds to broadleaf forest cover, we first need to extract a layer from our LCM raster consisting only of cells assigned to this land cover type in the raster (classification ‘1’). we can do this through a simple reclassification of the raster. First we convert the values of the raster layers to factors. In R factors represent categories (categorical data). You will hopefully have realized that our land cover map is a categorical rather than continuous raster, meaning that the values represent land cover types (rather than some continous variable such as elevation or slope - we’ll look at these later in the course).

#access levels of the raster by treating them as categorical data ('factors' in R)

LCM<-as.factor(LCM)

levels(LCM)

## [[1]]
##    ID label
## 1   0     0
## 2   1     1
## 3   2     2
## 4   3     3
## 5   4     4
## 6   5     5
## 7   7     7
## 8   8     8
## 9   9     9
## 10 10    10
## 11 11    11
## 12 12    12
## 13 13    13
## 14 14    14
## 15 15    15
## 16 16    16
## 17 17    17
## 18 18    18
## 19 19    19
## 20 20    20
## 21 21    21

Looking at this list we can create a binary raster (assigning 1 for broadleaf woodland - the second element in the list - and 0 for everything else).

To reclassify a raster we create a two-column matrix where the first column contains the original cell values and the second column contains our new values (this is similar to how the reclassify function works in ArcGIS only here we provide the matrix ourselves).

#create an vector object called reclass
reclass <- c(0,1,rep(0,19))

# combine with the LCM categories into a matrix of old and new values.
RCmatrix<- cbind(levels(LCM)[[1]],reclass)
RCmatrix<-RCmatrix[,2:3]#we only need columns 2 and 3

#Use the reclassify() function to asssign new values to LCM with our reclassification matrix
broadleaf <- classify(LCM, RCmatrix)

Inspect the new data visually:

plot(broadleaf)

plot(sciurusFin,add=TRUE)

7. Buffer Analysis

Before we create a function to implement our analysis for all points, let’s first work with the first point (the first row of our sciurusFin SpatialPoints layer) to understand the buffering process and visualize the result.

buf5km<-5000 #create an object to hold the distance parameter for our buffer


buffer.site1.5km<-buffer(sciurusFin[1,],width=buf5km) # use the buffer() function from the raster package applied to the first item of sciurusFin

Now use the zoom() function for a close up look at our buffered area (this will open in a separate window).

zoom(broadleaf,buffer.site1.5km) # use the zoom() function for a close-up of the result.

plot(buffer.site1.5km,border="red",lwd=5,add=T) # add the buffer



dev.off() # enter this to clear the active graphics pane

Now that we have our buffer we’ll to calculate the percentage cover by broadleaf woodland inside it.

To do so for a buffer width of 5km around our first site as an example we need to create the following objects:

an raster object for the broadleaf woodland inside each buffer (bufferlandcover5km below)
a numeric object holding the area described by each cell in our raster(grainarea below),
a numeric to hold the buffer area for a given radius in ha (bufferArea)
a numeric to hold the total area of woodland inside the buffer (landcover5km)
an object containing the final result as a percentage of the buffered area (percentlandcover5km )

Note that we use the crop() and mask() functions here. The first will clip our data to the extent (i.e. a bounding box delineated by the min and max x,y coordinates) of another object and the second clips to the mask (think of this as the outline) of another object. We could just use mask() directly, but this would take more time so putting both one after the other speeds up the analysis a little.

Also note the use of the res() function to access the resolution (size) of the LCM raster cells (again, you can use ?function() in the console to get more information).

buffer5km <- crop(broadleaf, buffer.site1.5km)#crop the broadleaf layer to the extent of the buffer 
bufferlandcover5km <- mask(broadleaf, buffer.site1.5km)#clip the above again to the circle described by the buffer (doing this speeds up the process compared to using only the mask() function)


bufferArea <- (3.14159*buf5km^2) #calculate the area of the buffer according to the buffer width
landcover5km <- sum(values(bufferlandcover5km),na.rm=T)*625 #total area of woodland inside the buffer (625 is the area in metres of each cell of our 25m raster) na.rm=T makes sure any NA values are removed from the calculation (otherwise NA is returned) 

percentlandcover5km <- landcover5km/bufferArea*100 #calculate percentage

#return the result
percentlandcover5km

## [1] 9.082184

Now that we are happy with the work flow above, let’s create a function to automate the application of these steps to all rows in our sciurusData data frame.

Creating your own function in R is easy. We simply need to assign a name to our function (we’ll call it ‘landBuffer’), use ‘function()’ to list the arguments that the function requires, then place the associated code (which will produce the objects we just created in the example above) inside two braces {}.

Let’s set an example (we haven’t looked at the abs() function yet,a common term found in many statistical packages - can you work out what it does?).

#Create two numerical vectors to work with
v1<-c(1,2,3,4,5,6)

v2<-c(6,7,8,9,10,11)

#create a function called meanDiff() that will take two objects (vectors), calculate the mean of both and reurn the difference between the mean values:

meanDiff<-function(A,B){
  c<-abs(mean(A)-mean(B))
  print(c)
}

meanDiff(v1,v2)

## [1] 5

A function to automate our buffer calculations:

#function for automating whole dataset. The function is set up take two arguments: a list of coordinates and a buffer width.

landBuffer <- function(coords, size){         
  coordinates<-cbind(coords[i,1],coords[i,2])
  sciurusPoints <- vect(coordinates,geom=c("x","y"))    # spatial points from coordinates  
  sciurusBuffer <- buffer(sciurusPoints, width=size)                     #buffer each point
  bufferlandcover <- crop(woodland, sciurusBuffer)              #crop the woodland layer to the buffer extent
  masklandcover <- mask(bufferlandcover, sciurusBuffer)        # mask the above to the buffer
  landcoverArea <- sum(values(masklandcover),na.rm = TRUE)*625    #use cellStats() to sum landcover area (625 is the area in metres of each cell of our 25m raster)
  percentcover <- landcoverArea/gArea(buffer)*100             # convert to precentage cover (we use the gArea() function from the rgeos package to get the area of our buffer)
  return(percentcover)                                        # return the result
}

Now we have our function it just remains to apply this function to every row in our sciurusData data frame for a series of buffer distances. We do this with a for loop that will iterate our function over each row in the SciurusData data frame and apply our function to each set of coordinates held in the row (remember the first two columns of our sciursData object are the XY coordinates).

First we create empty vectors for each distance parameter that will hold the result for the corresponding iteration of the landBuffer() function. We will experiment with buffer distances of 100-2000m.

Before we do that, let’s have a look at how for loops work in R. They are called in a similar way as calling any other function and take the format:

for(value in the sequence){ statement }

Try the below as an example:

#example for loop

#Create a vector of values
v1<-c(1,2,3,4,5,6,7,8,9,10,11)

#create a new empty vector to store the results of our for loop

x<-rep(NA,length=length(v1))

for (i in v1){
  x[i]<-(sum(v1)-i)
}

print(x)

##  [1] 65 64 63 62 61 60 59 58 57 56 55

In our case the sequence of values is the rows (‘nrow’) of our sciurusData data frame and the sequence of buffer radii we wish to try. The statement is the application of our function for each of the distances. Note: once run this loop may take several minutes to complete so please be patient!

We set the sequence of radii to provide by using the seq() function:

radii<-seq(100,2000,by=100)

Importantly, we need a place to hold the results for i rows of coordinate and j radii. We’ll just call this ‘res’

res <- matrix(NA, nrow = nrow(sciurusData), ncol = length(radii))
head(res)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]    NA    NA    NA    NA    NA    NA
## [2,]    NA    NA    NA    NA    NA    NA
## [3,]    NA    NA    NA    NA    NA    NA
## [4,]    NA    NA    NA    NA    NA    NA
## [5,]    NA    NA    NA    NA    NA    NA
## [6,]    NA    NA    NA    NA    NA    NA

Now we modify the function we made earlier to make it specific to our data so that for each row i we apply the buffer j (i.e. each number in the object ‘radii’). This time the landBuffer function will only have one argument, i, as we will enter the chosen radii j directly into the main body of the landBuffer function (as the second argument of the buffer() function). I have left a line above return(percentcover)} below. If you type “print(i)” (without the speech marks) in this empty line then the function will print each iteration of i to give you some idea of progress (there are 1316 rows in sciurusData).

landBuffer <- function(i){         
  
  
  SciurusPoints<-vect(sciurusData[i,],geom=c("x","y"),crs="epsg:27700")  
  sciurusBuffer <- buffer(SciurusPoints, width=radii[j])           #buffer each point
  bufferlandcover <- crop(broadleaf, sciurusBuffer)                 #crop the landcover layer to the buffer extent
  masklandcover <- mask(bufferlandcover, sciurusBuffer)            # mask the above to the buffer
  landcoverArea <- sum(values(masklandcover),na.rm = TRUE)*625      #use cellStats() to sum landcover area
  percentcover <- landcoverArea/expanse(sciurusBuffer)*100           # convert to precentage cover
  
  return(percentcover)} #get the result

Now run a for loop that iterates over each row (coordinates) in sciurusData and applies each buffer size (by using the seq_along() function), returning the results to our ‘res’ matrix. The matrix should be set up so that there are enough rows for each iteration of j - so when each row in a column is filled R will start to fill the next column. This way the results from our different buffer sizes should be separated into unique columns.

for(i in 1:nrow(sciurusData)){

  for(j in seq_along(radii)){
    
    res[i,j]<-landBuffer(i)}
  
  }

Take a look at the result in the ‘res’ matrix object.

#inspect
head(res)

##          [,1]     [,2]      [,3]      [,4]        [,5]      [,6]      [,7]
## [1,] 57.90234 35.93939 26.399919 25.581855 21.56363169 21.630186 21.433239
## [2,]  0.00000  0.00000  0.000000  0.000000  0.07985165  1.164503  1.670366
## [3,] 45.92017 32.44360 22.627329 18.218327 17.17014956 13.476571 12.345884
## [4,] 29.94946 22.96125 14.641959 11.480627  8.62544496  9.206686  8.679232
## [5,]  0.00000  0.00000  5.102782  4.867055  3.27465459  2.717786  3.178485
## [6,]  0.00000  0.00000  0.000000  0.000000  0.00000000  0.000000  0.000000
##           [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
## [1,] 20.590273 20.829068 22.102723 23.382053 24.708328 26.133439 26.017345
## [2,]  1.996291  2.858886  3.273917  3.349150  3.216247  3.189341  3.330547
## [3,] 12.665481 12.299605 11.380215 10.461152  9.719214  9.391956  8.495435
## [4,]  7.705747  6.975883  6.928309  7.078964  6.530647  6.497911  6.611293
## [5,]  3.712690  5.916268  7.607582  8.680046  8.181090  7.632511  6.886718
## [6,]  0.000000  1.257194  2.875277  4.802029  8.291934 11.460464 13.722356
##          [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
## [1,] 25.654510 25.527260 24.940636 24.021216 23.024880 22.422184
## [2,]  3.584452  3.649470  3.688649  3.721481  3.765857  3.578352
## [3,]  7.764280  7.065841  6.417908  5.829372  5.237432  4.791670
## [4,]  7.569449  7.643352  7.647994  8.041985  8.257534  7.896675
## [5,]  6.584807  6.528406  6.432407  6.150454  5.945973  5.590874
## [6,] 16.204492 18.781647 20.561379 21.606482 21.875423 21.494691

Now convert the values to a data frame so that we can do some analysis on these results.

#create data frame from the res matrix so we can do some statistics

glmData<-data.frame(res)
colnames(glmData)<-c("w100","w200","w300","w400","w500","w600","w700","w800","w900","w1000","w1100","w1200","w1300","w1400","w1500","w1600","w1700","w1800","w1900","w2000")


head(glmData)

##       w100     w200      w300      w400        w500      w600      w700
## 1 57.90234 35.93939 26.399919 25.581855 21.56363169 21.630186 21.433239
## 2  0.00000  0.00000  0.000000  0.000000  0.07985165  1.164503  1.670366
## 3 45.92017 32.44360 22.627329 18.218327 17.17014956 13.476571 12.345884
## 4 29.94946 22.96125 14.641959 11.480627  8.62544496  9.206686  8.679232
## 5  0.00000  0.00000  5.102782  4.867055  3.27465459  2.717786  3.178485
## 6  0.00000  0.00000  0.000000  0.000000  0.00000000  0.000000  0.000000
##        w800      w900     w1000     w1100     w1200     w1300     w1400
## 1 20.590273 20.829068 22.102723 23.382053 24.708328 26.133439 26.017345
## 2  1.996291  2.858886  3.273917  3.349150  3.216247  3.189341  3.330547
## 3 12.665481 12.299605 11.380215 10.461152  9.719214  9.391956  8.495435
## 4  7.705747  6.975883  6.928309  7.078964  6.530647  6.497911  6.611293
## 5  3.712690  5.916268  7.607582  8.680046  8.181090  7.632511  6.886718
## 6  0.000000  1.257194  2.875277  4.802029  8.291934 11.460464 13.722356
##       w1500     w1600     w1700     w1800     w1900     w2000
## 1 25.654510 25.527260 24.940636 24.021216 23.024880 22.422184
## 2  3.584452  3.649470  3.688649  3.721481  3.765857  3.578352
## 3  7.764280  7.065841  6.417908  5.829372  5.237432  4.791670
## 4  7.569449  7.643352  7.647994  8.041985  8.257534  7.896675
## 5  6.584807  6.528406  6.432407  6.150454  5.945973  5.590874
## 6 16.204492 18.781647 20.561379 21.606482 21.875423 21.494691

#add in the presences data
glmData$Pres<-sciurusData$Pres


head(glmData)

##       w100     w200      w300      w400        w500      w600      w700
## 1 57.90234 35.93939 26.399919 25.581855 21.56363169 21.630186 21.433239
## 2  0.00000  0.00000  0.000000  0.000000  0.07985165  1.164503  1.670366
## 3 45.92017 32.44360 22.627329 18.218327 17.17014956 13.476571 12.345884
## 4 29.94946 22.96125 14.641959 11.480627  8.62544496  9.206686  8.679232
## 5  0.00000  0.00000  5.102782  4.867055  3.27465459  2.717786  3.178485
## 6  0.00000  0.00000  0.000000  0.000000  0.00000000  0.000000  0.000000
##        w800      w900     w1000     w1100     w1200     w1300     w1400
## 1 20.590273 20.829068 22.102723 23.382053 24.708328 26.133439 26.017345
## 2  1.996291  2.858886  3.273917  3.349150  3.216247  3.189341  3.330547
## 3 12.665481 12.299605 11.380215 10.461152  9.719214  9.391956  8.495435
## 4  7.705747  6.975883  6.928309  7.078964  6.530647  6.497911  6.611293
## 5  3.712690  5.916268  7.607582  8.680046  8.181090  7.632511  6.886718
## 6  0.000000  1.257194  2.875277  4.802029  8.291934 11.460464 13.722356
##       w1500     w1600     w1700     w1800     w1900     w2000 Pres
## 1 25.654510 25.527260 24.940636 24.021216 23.024880 22.422184    1
## 2  3.584452  3.649470  3.688649  3.721481  3.765857  3.578352    1
## 3  7.764280  7.065841  6.417908  5.829372  5.237432  4.791670    1
## 4  7.569449  7.643352  7.647994  8.041985  8.257534  7.896675    1
## 5  6.584807  6.528406  6.432407  6.150454  5.945973  5.590874    1
## 6 16.204492 18.781647 20.561379 21.606482 21.875423 21.494691    1

Use these data to model Sciurus vulgaris occurrence as a function of forest cover at each scale with the general linear model function: glm(). In glm() we state the dependent variable as the first argument followed by tilda (~), we give “binomial” to indicate our dependent variable is binary (i.e. 0 and 1)

#make a list of glms (general linear models) to test the relationship between broadleaf cover and presence for the different buffer sizes

glm100<-glm(Pres~w100,family = "binomial",data = glmData)
glm200<-glm(Pres~w200,family = "binomial",data = glmData)
glm300<-glm(Pres~w300,family = "binomial",data = glmData)
glm400<-glm(Pres~w400,family = "binomial",data = glmData)
glm500<-glm(Pres~w500,family = "binomial",data = glmData)
glm600<-glm(Pres~w600,family = "binomial",data = glmData)
glm700<-glm(Pres~w700,family = "binomial",data = glmData)
glm800<-glm(Pres~w800,family = "binomial",data = glmData)
glm900<-glm(Pres~w900,family = "binomial",data = glmData)
glm1000<-glm(Pres~w1000,family = "binomial",data = glmData)
glm1100<-glm(Pres~w1100,family = "binomial",data = glmData)
glm1200<-glm(Pres~w1200,family = "binomial",data = glmData)
glm1300<-glm(Pres~w1300,family = "binomial",data = glmData)
glm1400<-glm(Pres~w1400,family = "binomial",data = glmData)
glm1500<-glm(Pres~w1500,family = "binomial",data = glmData)
glm1600<-glm(Pres~w1600,family = "binomial",data = glmData)
glm1700<-glm(Pres~w1700,family = "binomial",data = glmData)
glm1800<-glm(Pres~w1800,family = "binomial",data = glmData)
glm1900<-glm(Pres~w1900,family = "binomial",data = glmData)
glm2000<-glm(Pres~w2000,family = "binomial",data = glmData)

Now take all the logliklihoosd statistics and combine into a new data frame so we can plot the effect of scale on our modelling.

#make a list of loglikelihood values

ll100<-as.numeric(logLik(glm100))
ll200<-as.numeric(logLik(glm200))
ll300<-as.numeric(logLik(glm300))
ll400<-as.numeric(logLik(glm400))
ll500<-as.numeric(logLik(glm500))
ll600<-as.numeric(logLik(glm600))
ll700<-as.numeric(logLik(glm700))
ll800<-as.numeric(logLik(glm800))
ll900<-as.numeric(logLik(glm900))
ll1000<-as.numeric(logLik(glm1000))
ll1100<-as.numeric(logLik(glm1100))
ll1200<-as.numeric(logLik(glm1200))
ll1300<-as.numeric(logLik(glm1300))
ll1300<-as.numeric(logLik(glm1300))
ll1400<-as.numeric(logLik(glm1400))
ll1500<-as.numeric(logLik(glm1500))
ll1600<-as.numeric(logLik(glm1600))
ll1700<-as.numeric(logLik(glm1700))
ll1800<-as.numeric(logLik(glm1800))
ll1900<-as.numeric(logLik(glm1900))
ll2000<-as.numeric(logLik(glm2000))

#combine into a vector
ll<-c(ll100,ll200,ll300,ll400,ll500,ll600,ll700,ll800,ll900,ll1000,ll1100,ll1200,ll1300,ll1400,ll1500,ll1600,ll1700,ll1800,ll1900,ll2000)

#do the same for our distance values
Dist<-c(100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000)

#make a dataframe
glmRes<-data.frame(cbind(Dist,ll))


#and plot

plot(glmRes$Dist, glmRes$ll, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "buffer", ylab = "logLik")

#now, by finding the max loglikelihood we can determine the optimum buffer size

opt<-glmRes[which(glmRes$ll==max(glmRes$ll)),]

#print
opt

##    Dist        ll
## 18 1800 -581.4261

Our results give us some confidence in the characteristic scales that we need to consider for the response of red squirrel to broadleaf woodland. Our data suggests a bimodal response with a strong signal at 100m and another at 1800m.

You’ve reached the end of this week’s practical. Please continue to the exercises below.

Week Two independent exercises

Open a new script and execute the following in R.

1.Calculate the characteristic scale(s) of the response by the Eurasian badger (Meles meles) to broadleaf tree cover for buffer distances of between 100 and 2000m in 100m intervals. Using the log likelihood statistic for model selection, calculate the optimum scale for modelling badger response to broadleaf tree cover for the same extent as that used in today’s practical. Use the occurence data in the “Melesmeles.csv” file in the GEOG71922 shared drive (or dropbox). Remove any records that are “unconfirmed” (under the “Identification verification status” field in the csv file) and any points with coordinate uncertainty > 1000m. Produce plots showing the relationship between buffer radii and your model performance (loglikelihood statistic).

Write a 300 word summary of the importance of scale in modelling species-habitat relationships. Draw on ideas covered in the lectures in Weeks 1 and 2 and the suggested reading, relating these to your results (from the practical and exercise one).

GEOG71922: Practical 2: looking at species occurrence and characteristic scale

MD

03/02/2023