GEOG71922: Week One: looking at species-environment relationships and characteristic scale

Author

MD 29/01/2026

In Practical One of GEOG71922 we will cover a range of spatial techniques in R that are central to working with spatial data in ecology. Today you will learn how to: 1) convert tabulated data to spatial point distributions, 2) crop data quickly and neatly to a desired study extent, 3) reclassify a raster for more focussed analysis, 4) generate background (or pseudo-absence) points for your study area, 5) assign and transform spatial projections, 6) use for loops, and 7) create your own functions in R.

The main aim of the practical will be to evaluate the characteristic scale which describes how the occurrence of Sciurus vulgaris (Eurasian Red Squirrel) responds to land cover. We will use occurrence data downloaded from the National Biodiversity Network Atlas [https://nbnatlas.org/] to model the distribution of Sciurus vulgaris (point data uploaded since 2015) and use the UK Land Cover Map (2015) to characterize associated habitat types. We will evaluate this relationship through a generalised linear model approach at a number of scales using a series of buffers around occurrence data points.

  1. Read in and map the data

Copy the data from the “Sciurus” folder in the GEOG71922 area on the shared drive to a folder for Week One in your own GEOG71922 directory. Open R Studio and create a new script for today’s practical (Prac1.R). Remember how to set the working directory from the Week Zero exercise:


setwd("./GEOG71922/Week1")

Install the packages for today’s work:


#Install packages

install.packages(c("terra","sf","mapview"))

Load the libraries that we will need from the start of today’s practical. (Note: you may receive some information or warnings in the R console similar to the ones below, as produced on my machine. You can ignore these; they just provide information about versions and compatibility.)

library(terra)
Warning: package 'terra' was built under R version 4.5.2
terra 1.8.93
library(sf)
Warning: package 'sf' was built under R version 4.5.2
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE

Now read in the spreadsheet containing the coordinates for occurrence locations and some other miscellaneous data columns. Notice that when you read in a .csv file the object automatically takes the class of data.frame.

sciurus<- read.csv("Sciurus.csv")

# view the first six rows of the data
head(sciurus) 
                   NBN.Atlas.record.ID  Scientific.name Name.qualifier
1 fd44c89f-dc54-4bc8-b242-b25f34714c67 Sciurus vulgaris             NA
2 fbcf21ec-d168-4b2b-9644-d798e0d296b8 Sciurus vulgaris             NA
3 fbc7523e-c291-497f-8c1f-821e7a715e7a Sciurus vulgaris             NA
4 f9dad20d-9ea8-4b58-aafd-cdeee990d703 Sciurus vulgaris             NA
5 f55a1c93-8b8d-4f9d-88be-b81561231bc9 Sciurus vulgaris             NA
6 e2d32bf2-d58d-4cfc-ae48-ac5668d7842f Sciurus vulgaris             NA
            Common.name Species.ID..TVK. Taxon.Rank Occurrence.status
1 Eurasian Red Squirrel NBNSYS0000005108    species           present
2 Eurasian Red Squirrel NBNSYS0000005108    species           present
3 Eurasian Red Squirrel NBNSYS0000005108    species           present
4 Eurasian Red Squirrel NBNSYS0000005108    species           present
5 Eurasian Red Squirrel NBNSYS0000005108    species           present
6 Eurasian Red Squirrel NBNSYS0000005108    species           present
  Start.date.year Locality  OSGR Latitude Longitude Coordinate.uncertainty_m
1            2015          NO64G 56.57747  -2.60386                     2000
2            2015          SD31R 53.60975  -2.95368                     2000
3            2015          NS99Z 56.17291  -3.62832                     2000
4            2015          SD31W 53.60999  -2.92346                     2000
5            2015          SD31V 53.59201  -2.92306                     2000
6            2015          SD30N 53.55558  -2.98265                     2000
  Identification.verification.status Vitality
1                        Unconfirmed         
2      Accepted - considered correct         
3                        Unconfirmed         
4      Accepted - considered correct         
5      Accepted - considered correct         
6      Accepted - considered correct         
#check the class of the loaded .csv file.
class(sciurus)
[1] "data.frame"

We will do a small amount of data cleaning next, to ensure there are no NA values in our coordinate data (these would cause later analytical steps to crash). Here we subset using the square-bracket [] approach and the function !is.na() to select all rows with complete values for latitude (‘!’ means “not” in the R language). In addition, let’s remove all records with coordinate uncertainty greater than 100 m.

#subset the data to only include points with complete coordinates. Remember that using [x,] will subset the "x" rows of the object (accessed through the left-hand side of the comma) and select all columns (by leaving the right-hand side of the comma blank).

sciurus<-sciurus[!is.na(sciurus$Latitude),]

#remove all points with uncertainty > 100m
sciurus<-sciurus[sciurus$Coordinate.uncertainty_m<100,]
  2. Create a vector layer from coordinates

Next we make a vector layer from our sciurus object.

To start with we will use the terra library for all our spatial operations (because terra handles both vector and raster objects in an easy, compatible way). Once we have done the basic spatial data processing, though, we will use the sf library for analysis. I generally use sf wherever possible as it is a highly intuitive format where spatial data are treated simply as data frames with an extra column for coordinates (this is basically all that differentiates GIS from other kinds of analysis, so I like that this is made explicit in sf).
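As a minimal sketch of this idea (using hypothetical coordinates, not our squirrel data), converting a plain data frame to sf simply appends a geometry column:

#hypothetical example: an sf object is just a data frame with a geometry column
df<-data.frame(id=1:2,x=c(-2.6,-2.9),y=c(56.6,53.6))

pts<-st_as_sf(df,coords=c("x","y"),crs=4326)

class(pts) #returns "sf" "data.frame"

names(pts) #our attribute column plus a new "geometry" column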

For the first step in getting the spatial data set up, we combine two things: a set of coordinates (in this case latitude and longitude) and a PROJ.4 string. The latter tells R what coordinate system our coordinates are derived from. In this case we are working (for now) in the WGS84 system. The full string that describes the projection for WGS84 is “+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0”. Fortunately, R allows us to pass the EPSG code for WGS84 instead. EPSG codes are a kind of shorthand for projection strings, and the code for WGS84 is “epsg:4326”. So all we have to do is create our points from the coordinates with the vect() function and then assign this code to them with the crs() function.

Note the use of the ‘$’ symbol below for accessing columns of data frames.

#make spatial points layer

#create a data frame of coordinates
sciurus.latlong<-data.frame(x=sciurus$Longitude,y=sciurus$Latitude)



#Use coordinates object to create our spatial points object

sciurus.sp<-vect(sciurus.latlong,geom=c("x","y"))

#assign our desired crs to the points
crs(sciurus.sp)<-"epsg:4326"

Type ?crs directly into the console and press enter to bring up information about this function in the help panel. The crs() function is very versatile and allows us to get or set the crs of any SpatRaster or SpatVector object, as well as creating character objects that hold projection strings which can then be applied to any number of objects. The latter is important when you have data without a formal projection (usually from a text or csv file, as in our case) and want to assign one from scratch. Under the hood, terra relies on the GDAL and PROJ libraries for handling projections. You can find plenty more information on EPSG codes and on using CRS in R online.
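As a minimal sketch of the two uses (getting and setting), note that assigning a crs in this way only labels the coordinates with a coordinate system; it does not transform them (that is the job of the project() function, which we use below):

#get: return the current projection information
crs(sciurus.sp)

#set: (re)assign a crs without changing the coordinate values themselves
crs(sciurus.sp)<-"epsg:4326"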

Now plot our points:

plot(sciurus.sp)

Now let’s have a look at a handy tool that allows us to work interactively with spatial data in R - the ‘mapview’ library. This allows us to look at our data and pan around their extent against base maps, similar to how you would in QGIS or ArcGIS. Run the following code then have a play around with the options. When you’re ready, move on to the next step. We will focus our analysis around the Cairngorms area of Scotland; see if you can locate it on the map.

#Note: you may receive a warning message about Feature IDs when you run this command, but you can ignore this for now.

library(mapview)
mapview(st_as_sf(sciurus.sp))
Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a
multiple of vector length (arg 1)

We have a lot of occurrence data and, although it would be nice to consider all of these points, that would be too computationally intensive for our practical session, so let’s cut them down to something we can work with by clipping them to a much smaller extent. To do this we use the crop() function and create a new extent:

#First set the extent to something workable

studyExtent<-c(-4.2,-2.7,56.5,57.5) #list coordinates in the order: min x, max x, min y, max y

#now crop our points to this area

C<-crop(sciurus.sp,studyExtent)
  3. Mapping land-cover

Now let’s bring in our land cover data, to see where our points fit on the map, using the rast() function.

#read in our raster data

LCM=rast("LCMUK.tif")

The raster is set to OSGB (the British National Grid), so our two data sets have different projections. Let’s fix that with the project() function. Here we take the crs of the LCM layer inside the project() function and apply it to our points layer:

sciurusFin<-project(C,crs(LCM))

Note that above we did not need to use any character strings to define our projection. This is down to the versatility of the crs() function whereby, in this case, we can simply pass the crs of the LCM layer directly to the project() function.
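For reference, a minimal alternative sketch: the British National Grid (OSGB) has the EPSG code 27700, so (assuming the LCM layer is indeed in that system) passing the code as a character string would achieve the same transformation:

#equivalent to passing crs(LCM), assuming LCM uses the British National Grid
#sciurusFin<-project(C,"epsg:27700")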

Before we continue, let’s crop the land cover data to the extent of our points data (plus 5km to allow space for the buffers we will create in subsequent steps).

sciurusCoords<-crds(sciurusFin)


x.min <- min(sciurusCoords[,1]) - 5000
x.max <- max(sciurusCoords[,1]) + 5000
y.min <- min(sciurusCoords[,2]) - 5000
y.max <- max(sciurusCoords[,2]) + 5000

extent.new <- ext(x.min, x.max, y.min, y.max)


LCM <- crop(LCM$LCMUK_1, extent.new)

plot(LCM)

plot(sciurusFin,add=TRUE)

Now we are ready to start characterizing the locations of our points with the LCM layer.

  4. Generating (pseudo-)absence points

Our analysis will be based on attempting to predict the locations of our points relative to the underlying habitat. In order to do this, however, we also want to characterize areas where Sciurus vulgaris doesn’t occur. We could try without absence data (and some algorithms such as Maxent and Bioclim are built for this) but it is more robust to provide absence as well as presence data in our modelling. Since we don’t have formal absence point locations, we will simulate our own by creating random background points (also called pseudo-absence data). For this we will use the spatSample() function (one of the handier functions in terra that doesn’t, in my view, have a better equivalent in sf). With as.points=TRUE, this function returns a set of random locations directly as a vector layer of points.

Note that in practice we might create many more background points to get a representative sample but we will settle for 1000 for this practical to reduce processing time later on.

set.seed(11)

back.xy <- spatSample(LCM, size=1000,as.points=TRUE) 

#plot the background points (red) over the land cover and presence points

plot.new()

plot(LCM)

plot(sciurusFin,add=T)

plot(back.xy,add=TRUE, col='red')

  5. Characterizing point locations

Now we will use some functions that we learned in the introductory practical to extract land cover values to our points and plot them to compare presence and absence locations.

eA<-extract(LCM,back.xy)

eP<-extract(LCM,sciurusFin)

A very useful function for cross-tabulating data (i.e. summing counts per category of interest) is the table() function. Let’s use this to tabulate counts per landcover type for our presence and absence points.

#calculate point frequency per landcover category (headings are landcover category codes - see LCM documentation for descriptions)
table(eA[,2])

  0   1   2   3   4   7   9  10  11  12  14  16  20  21 
  2  39 129 101 122 129 328  53  26  48  13   1   2   7 
table(eP[,2])

  1   2   3   4   7   9  10  11  14  21 
102  88  23  48  27   1   1   1   1  24 

Use the range of values provided by table(eA) and table(eP) to set the x axes of two histograms comparing presence and absence locations. Note that ylim will be 1 on this occasion as we will set the argument ‘freq’ in the hist() function to FALSE, so that the values on the y axis of our histograms represent proportions rather than counts (this is important as the numbers of points in our presence and absence data are markedly different).

par(mfrow=c(1,2))

hist(eA[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1))




hist(eP[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1))

The max value on our y axis seems to be 0.4, so change ylim to ylim=c(0,0.5) for both histograms and run again (remember the c() function for combining elements - here min and max values for our axes).

hist(eA[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.5))

hist(eP[,2],freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.5))

From these outputs we can see that the proportion of points occurring in broadleaf woodland (class 1 in the UK Landcover Map - see documentation) is much higher for presence relative to background points. The focus for the rest of the practical will be determining to what extent this response to broadleaf woodland differs as a function of scale and, if it does, what the characteristic scale of this effect is (so we can operationalize this optimum scale in future analyses, for example for species distribution modelling).

Next, let’s create a single data frame to hold both our presence and absence data so that we can start to carry out our scale analysis on the one dataset.

Abs<-data.frame(crds(back.xy),Pres=0)

head(Abs)
         x        y Pres
1 357087.5 747287.5    0
2 288062.5 774962.5    0
3 319062.5 791362.5    0
4 284137.5 836537.5    0
5 286387.5 762612.5    0
6 292987.5 825812.5    0
Pres<-data.frame(crds(sciurusFin),Pres=1)


head(Pres)
         x        y Pres
1 302144.9 826416.5    1
2 348875.5 780667.0    1
3 313613.0 791183.4    1
4 302332.6 820625.2    1
5 291470.3 822625.6    1
6 293429.5 818944.1    1
 # bind the two data frames by row (both dataframes have the same column headings)
sciurusData<-rbind(Pres,Abs)

#inspect
head(sciurusData)
         x        y Pres
1 302144.9 826416.5    1
2 348875.5 780667.0    1
3 313613.0 791183.4    1
4 302332.6 820625.2    1
5 291470.3 822625.6    1
6 293429.5 818944.1    1

Now we employ the magic of sf to simply add coordinates to the data frame, so that we can access them on and off for the rest of the practical:

sciurusSF=st_as_sf(sciurusData,coords=c("x","y"),crs="EPSG:27700")
  6. Re-classifying the raster

In order to explore how occurrence responds to broadleaf forest cover, we first need to extract a layer from our LCM raster consisting only of cells assigned to this land cover type in the raster (class ‘1’). We can do this through a simple reclassification of the raster. First we convert the values of the raster layers to factors. In R, factors represent categories (categorical data). You will hopefully have realized that our land cover map is a categorical rather than continuous raster, meaning that the values represent land cover types (rather than some continuous variable such as elevation or slope - we’ll look at these later in the course).
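If factors are new to you, here is a quick base-R sketch (with made-up values, nothing to do with our raster):

#factors store categorical data as integer codes with text labels ('levels')
lc<-factor(c("woodland","water","woodland","heath"))

levels(lc) #the unique categories: "heath" "water" "woodland"

as.integer(lc) #the underlying integer codes: 3 2 3 1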

#access levels of the raster by treating them as categorical data ('factors' in R)

LCM<-as.factor(LCM)

levels(LCM)
[[1]]
   ID LCMUK_1
1   0       0
2   1       1
3   2       2
4   3       3
5   4       4
6   5       5
7   7       7
8   8       8
9   9       9
10 10      10
11 11      11
12 12      12
13 13      13
14 14      14
15 15      15
16 16      16
17 17      17
18 18      18
19 19      19
20 20      20
21 21      21

Looking at this list we can create a binary raster (assigning 1 for broadleaf woodland - the second element in the list - and 0 for everything else).

To reclassify a raster we create a two-column matrix where the first column contains the original cell values and the second column contains our new values.

#create a vector object called reclass holding the new values (1 for broadleaf woodland, 0 for everything else)
reclass <- c(0,1,rep(0,19))

# combine with the LCM categories into a matrix of old and new values.
RCmatrix<- cbind(levels(LCM)[[1]],reclass)

RCmatrix<-RCmatrix[,2:3]

#apply function to make sure new columns are numeric (here the "2" specifies that we want to apply the as.numeric function to columns, where "1" would have specified rows)
RCmatrix=apply(RCmatrix,2,FUN=as.numeric)
#inspect the reclassification matrix
RCmatrix
      LCMUK_1 reclass
 [1,]       0       0
 [2,]       1       1
 [3,]       2       0
 [4,]       3       0
 [5,]       4       0
 [6,]       5       0
 [7,]       7       0
 [8,]       8       0
 [9,]       9       0
[10,]      10       0
[11,]      11       0
[12,]      12       0
[13,]      13       0
[14,]      14       0
[15,]      15       0
[16,]      16       0
[17,]      17       0
[18,]      18       0
[19,]      19       0
[20,]      20       0
[21,]      21       0
#Use the classify() function to assign new values to LCM with our reclassification matrix
broadleaf <- classify(LCM, RCmatrix)

Inspect the new data visually:

plot(broadleaf)

plot(sciurusFin,add=TRUE)

  7. Buffer Analysis

Before we create a function to implement our analysis for all points, let’s first work with the first point (the first row of our sciurusSF points layer) to understand the buffering process and visualize the result.

#create an object to hold the distance parameter for our buffer
buf5km<-5000 

# use the st_buffer() function from the sf package applied to the first row of sciurusSF

buffer.site1.5km<-st_buffer(sciurusSF[1,],dist=buf5km) 

Now use the zoom() function for a close-up look at our buffered area (this will open in a separate window).

zoom(broadleaf,buffer.site1.5km) # use the zoom() function for a close-up of the result.

plot(buffer.site1.5km$geometry,border="red",lwd=2,add=T) # add the buffer

#dev.off() # enter this to clear the active graphics pane

Now that we have our buffer we’ll calculate the percentage cover by broadleaf woodland inside it.

To do so for a buffer width of 5km around our first site as an example we need to create the following objects:

a raster object for the broadleaf woodland inside each buffer (bufferlandcover5km below)

a numeric to hold the buffer area for a given radius in square metres (bufferArea)

a numeric to hold the total area of woodland inside the buffer (landcover5km)

an object containing the final result as a percentage of the buffered area (percentlandcover5km)

Note that we use the crop() and mask() functions here. The first will clip our data to the extent (i.e. a bounding box delineated by the min and max x,y coordinates) of another object and the second clips to the mask (think of this as the outline) of another object. We could just use mask() directly, but this would take more time so putting both one after the other speeds up the analysis a little.

buffer5km <- crop(broadleaf, buffer.site1.5km)#crop the broadleaf layer to the extent (bounding box) of the buffer 
bufferlandcover5km <- mask(buffer5km, buffer.site1.5km)#clip the cropped layer again to the circle described by the buffer (cropping first speeds up the process compared to using only the mask() function)


bufferArea <- (3.14159*buf5km^2) #calculate the area of the buffer (in square metres) from the buffer radius

landcover5km <- sum(values(bufferlandcover5km),na.rm=T)*625 #total area of woodland inside the buffer (625 is the area in square metres of each cell of our 25x25m raster); na.rm=T makes sure any NA values are removed from the calculation (otherwise NA is returned) 

percentlandcover5km <- landcover5km/bufferArea*100 #calculate percentage

#return the result
percentlandcover5km
[1] 9.088551

Now that we are happy with the work flow above, let’s create a function to automate the application of these steps to all rows in our sciurusSF data frame.

Creating your own function in R is easy. We simply need to assign a name to our function (we’ll call it ‘landBuffer’), use ‘function()’ to list the arguments that the function requires, then place the associated code (which will produce the objects we just created in the example above) inside two braces {}.

Let’s try an example (we haven’t looked at the abs() function yet, a common function found in many statistical packages - can you work out what it does?).

#Create two numerical vectors to work with
v1<-c(1,2,3,4,5,6)

v2<-c(6,7,8,9,10,11)

#create a function called meanDiff() that will take two objects (vectors), calculate the mean of both and return the difference between the mean values:

meanDiff<-function(A,B){
  d<-abs(mean(A)-mean(B))
  print(d)
}

meanDiff(v1,v2)
[1] 5

Now for a function to automate our buffer calculations. Note that in this version, in contrast to the original percent-woodland calculation above, we use the extract() function to extract all raster values within each buffer as a new object and sum those values. This gives us the number of woodland cells in each buffer and, multiplying by the cell area, the area of woodland in each buffer. Can you think why this is preferable to simply masking out all the cells within the buffer and then dividing by the area of the buffer, as in the example above?

#function for automating the whole dataset. The function is set up to take two arguments: a points data frame and a buffer distance.

landBuffer <- function(speciesData, r){         
  
  #buffer each point
  sciurusBuffer <- st_buffer(speciesData, dist=r)                     
  
  #crop the woodland layer to the buffer extent
  bufferlandcover <- crop(broadleaf, sciurusBuffer)              
    
  # now extract the raster values (which should all be 1 for woodland and 0 for everything else) within each buffer and sum to get number of woodland cells inside the buffers.
  masklandcover <- extract(bufferlandcover, sciurusBuffer,fun="sum")      
#get woodland area (625 is the area in square metres of each cell of our 25x25m raster)
  landcoverArea <- masklandcover$LCMUK_1*625  
  
  # convert to percentage cover. We use the st_area() function from the sf package to get the area of our buffer, but convert it to a numeric object (because sf attaches units, i.e. square metres, which can't then be entered into numeric calculations)
  percentcover <- landcoverArea/as.numeric(st_area(sciurusBuffer))*100 
  
  return(percentcover)                                        # return the result
}

Now that we have our function, it just remains to apply it to every row in our sciurusSF data frame for a series of buffer distances. We do this with a for loop that will iterate our function over each radius value.

Before we do that, let’s have a look at how for loops work in R. They are called in a similar way to any other function and take the format:

for(value in sequence){ statement }

Try the below as an example:

#example for loop

#Create a vector of values
v1<-c(1,2,3,4,5,6,7,8,9,10,11)

#create a new empty vector to store the results of our for loop

x<-rep(NA,length.out=length(v1))

for (i in v1){
  x[i]<-(sum(v1)-i)
}

print(x)
 [1] 65 64 63 62 61 60 59 58 57 56 55

In our case the sequence of values is the sequence of buffer radii we wish to try. The statement is the application of our function for each of the distances. Note: once run, this loop may take several minutes to complete, so please be patient!

We set the sequence of radii using the seq() function:

radii<-seq(100,2000,by=100)

Now we apply the landBuffer function to our data so that for each radius i (i.e. each number in the object ‘radii’) we buffer every point in sciurusSF and calculate the percentage woodland cover. Note the print(i) line inside the loop below: this prints each iteration of i to give you some idea of progress (there are 20 radii to work through, and each iteration buffers all 1316 rows of our data).

So, we will iterate over the range of buffer radii using a for loop, apply our function, and return the results to a list. The following code applies the function to all points in our data frame for each radius and stores the results. After the loop we gather the list into a single data frame using do.call() with cbind:

resList=list()

for(i in radii){
  res.i=landBuffer(speciesData=sciurusSF,r=i)
  resList[[i/100]]=res.i
  print(i)
}
[1] 100
[1] 200
[1] 300
[1] 400
[1] 500
[1] 600
[1] 700
[1] 800
[1] 900
[1] 1000
[1] 1100
[1] 1200
[1] 1300
[1] 1400
[1] 1500
[1] 1600
[1] 1700
[1] 1800
[1] 1900
[1] 2000
resFin=do.call("cbind",resList)

glmData=data.frame(resFin)

colnames(glmData)=paste("radius",radii,sep="")

Take a look at the results…

#inspect


head(glmData)
  radius100 radius200 radius300 radius400   radius500 radius600 radius700
1  51.74900  34.33347 25.211051 24.630533 21.17728256 20.235185 20.675228
2   0.00000   0.00000  0.000000  0.000000  0.07961384  1.161035  1.584153
3  37.81658  28.36243 21.009209 16.669149 16.00238269 13.103112 11.657741
4  23.88415  21.39622 13.047824 11.444490  8.20022595  8.956557  8.530055
5   0.00000   0.00000  4.422991  4.478279  3.26416761  2.598507  3.127687
6   0.00000   0.00000  0.000000  0.000000  0.00000000  0.000000  0.000000
  radius800  radius900 radius1000 radius1100 radius1200 radius1300 radius1400
1 19.872362 19.9526054  21.296703  22.453078  23.856510  25.450520  25.519080
2  1.865949  2.6046505   3.144747   3.273379   3.165203   3.132734   3.269854
3 12.004275 11.9420766  11.185745  10.313612   9.509431   9.245099   8.418352
4  7.494897  6.8310644   6.727370   6.941538   6.358050   6.383240   6.417851
5  3.514205  5.6270279   7.284667   8.520655   8.085781   7.525628   6.844353
6  0.000000  0.8600261   2.547643   4.326124   7.740235  10.870352  13.272359
  radius1500 radius1600 radius1700 radius1800 radius1900 radius2000
1  25.211051  25.135894  24.607289  23.816580  22.803522  22.192359
2   3.520701   3.584178   3.643229   3.667397   3.721561   3.567695
3   7.696005   7.012860   6.377372   5.805176   5.221213   4.756927
4   7.324474   7.533771   7.499955   7.832381   8.126787   7.807133
5   6.448721   6.421976   6.315389   6.063184   5.871797   5.548090
6  15.560084  18.185233  20.006766  21.144356  21.513381  21.236993
#add in the presence data
glmData$Pres<-sciurusData$Pres


head(glmData)
  radius100 radius200 radius300 radius400   radius500 radius600 radius700
1  51.74900  34.33347 25.211051 24.630533 21.17728256 20.235185 20.675228
2   0.00000   0.00000  0.000000  0.000000  0.07961384  1.161035  1.584153
3  37.81658  28.36243 21.009209 16.669149 16.00238269 13.103112 11.657741
4  23.88415  21.39622 13.047824 11.444490  8.20022595  8.956557  8.530055
5   0.00000   0.00000  4.422991  4.478279  3.26416761  2.598507  3.127687
6   0.00000   0.00000  0.000000  0.000000  0.00000000  0.000000  0.000000
  radius800  radius900 radius1000 radius1100 radius1200 radius1300 radius1400
1 19.872362 19.9526054  21.296703  22.453078  23.856510  25.450520  25.519080
2  1.865949  2.6046505   3.144747   3.273379   3.165203   3.132734   3.269854
3 12.004275 11.9420766  11.185745  10.313612   9.509431   9.245099   8.418352
4  7.494897  6.8310644   6.727370   6.941538   6.358050   6.383240   6.417851
5  3.514205  5.6270279   7.284667   8.520655   8.085781   7.525628   6.844353
6  0.000000  0.8600261   2.547643   4.326124   7.740235  10.870352  13.272359
  radius1500 radius1600 radius1700 radius1800 radius1900 radius2000 Pres
1  25.211051  25.135894  24.607289  23.816580  22.803522  22.192359    1
2   3.520701   3.584178   3.643229   3.667397   3.721561   3.567695    1
3   7.696005   7.012860   6.377372   5.805176   5.221213   4.756927    1
4   7.324474   7.533771   7.499955   7.832381   8.126787   7.807133    1
5   6.448721   6.421976   6.315389   6.063184   5.871797   5.548090    1
6  15.560084  18.185233  20.006766  21.144356  21.513381  21.236993    1

Next, we use these data to model Sciurus vulgaris occurrence as a function of woodland cover at each scale with the generalised linear model function glm(). In glm() we state the dependent variable as the first part of the formula, followed by a tilde (~) and the predictor, and we set family = "binomial" to indicate that our dependent variable is binary (i.e. 0 or 1).
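Before looping over all twenty scales, it may help to fit a single model by hand. Here is a minimal sketch for the 100m scale, using the glmData object we just built:

#fit one binomial glm: presence as a function of % broadleaf cover within 100m
m100<-glm(Pres~radius100,family="binomial",data=glmData)

summary(m100) #the radius100 coefficient gives the effect of cover at this scale

logLik(m100) #the log likelihood statistic we will use to compare scales below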

#make an empty data frame, to which we will then add a sequence of results from glms (generalised linear models) that test the relationship between broadleaf cover and red squirrel presence for the different buffer sizes

#init empty data frame
glmRes=data.frame(radius=NA,loglikelihood=NA)

#for loop to iterate over radius values
for(i in radii){
  
  n.i=paste0("Pres~radius",i)
  
  glm.i=glm(formula(n.i),family = "binomial",data = glmData)
  
  ll.i=as.numeric(logLik(glm.i))
  
  glmRes=rbind(glmRes,c(i,ll.i))

  }

#inspect
head(glmRes)
  radius loglikelihood
1     NA            NA
2    100     -604.4495
3    200     -599.9888
4    300     -605.5786
5    400     -613.6037
6    500     -614.8780

…and plot

plot(glmRes$radius, glmRes$loglikelihood, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "buffer", ylab = "logLik")

#now, by finding the max loglikelihood we can determine the optimum buffer size

#remove the NAs in the first row
glmRes=glmRes[!is.na(glmRes$radius),]

#use the which.max function to subset the dataframe to just the row containing the max log likelihood value

opt<-glmRes[which.max(glmRes$loglikelihood),]

#print
opt
   radius loglikelihood
19   1800     -581.1424

Our results give us some confidence in the characteristic scales that we need to consider for the response of red squirrel to broadleaf woodland. Our data suggest a bimodal response, with a local peak in log likelihood at 200m and the overall highest value at 1800m.

You’ve reached the end of this week’s practical. Please continue to the exercise below.

Week One independent exercises

Open a new script and execute the following in R.

Calculate the characteristic scale(s) of the response by the Eurasian badger (Meles meles) to broadleaf tree cover for buffer distances of between 100 and 2000m in 100m intervals. Using the log likelihood statistic for model selection, calculate the optimum scale for modelling badger response to broadleaf tree cover for the same extent as that used in today’s practical. Use the occurrence data in the “Melesmeles.csv” file in the GEOG71922 shared drive (on dropbox). Remove any records that are “unconfirmed” (under the “Identification verification status” field in the csv file) and any points with coordinate uncertainty > 1000m. Produce plots showing the relationship between buffer radii and your model performance (log likelihood statistic).